Changing institutions is an integral part of academic life. Yet little is known about the mobility patterns
of scientists at an institutional level and how these career choices affect scientific outcomes. Here, we
examine over 420,000 papers to track the affiliation information of individual scientists, allowing us to
reconstruct their career trajectories over decades. We find that career movements are not only temporally
and spatially localized, but also characterized by a high degree of stratification in institutional ranking.
When cross-group movement occurs, we find that while going from elite to lower-rank institutions is, on
average, associated with a modest decrease in scientific performance, transitioning into elite institutions does
not result in a subsequent performance gain. These results offer empirical evidence on institutional-level
career choices and movements and have potential implications for science policy.



Despite their importance for education, scientific productivity, reward and hiring procedures, our quantitative
understanding of how individuals make career moves and relocate to new institutions, and how
such moves shape and affect performance, remains limited. Indeed, previous research on migration
patterns of scientists tended to focus on large-scale surveys of country-level movements, revealing long-term
cultural and economic priorities. At a much finer scale, research on human dynamics and mobility has
emerged as an active line of enquiry owing to new and increasingly available massive datasets providing
time-resolved individual trajectories. While these studies cover a much shorter time scale than a typical career, they
uncover a set of regularities and reproducible patterns behind human movements. Less is known about
patterns behind career moves at an institutional level and how these moves affect individual performance.

Here we take advantage of the fact that scientists publish somewhat regularly along their career, and for each
publication, the institution in which the work was performed is listed as an affiliation in the paper, documenting
career trajectories at a fine scale and in great detail. These digital traces, offering data on not only individual
scientific output at each institution but also career moves from one institution to another, can provide insights for
science policy, helping us understand how institutions shape knowledge and what the typical moves of individual
career development are, and helping us evaluate scientific outcomes associated with professional mobility.

We use the Physical Review dataset to extract mobility information, publication records, and citations for
individual scientists. The data consist of 237,038 physicists and 425,369 scientific papers, from which 4,052
different institutions are extracted after the disambiguation process for authors and affiliations (see SM for
the disambiguation process). To reconstruct the career trajectory of a scientist, we use the affiliation given in each
of his/her publications (Fig. 1). For authors with multiple affiliations listed on a paper, we consider the first
affiliation as the primary institution. We compute the impact of each paper by counting the cumulative citations
it collected within 5 years after its publication.
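
The 5-year citation window used as the impact measure can be sketched as follows. This is a minimal illustration only: the function name `five_year_citations` and the date-based representation of citations are assumptions made for the example, not the format of the APS data.

```python
from datetime import date


def five_year_citations(published, citing_dates, window_years=5):
    """Count the citations a paper receives within `window_years` of publication.

    `published` is the publication date of the cited paper and `citing_dates`
    holds the publication dates of the papers citing it (hypothetical layout).
    """
    try:
        cutoff = published.replace(year=published.year + window_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        cutoff = published.replace(year=published.year + window_years, day=28)
    # Keep only citations dated between publication and the cutoff, inclusive.
    return sum(published <= d <= cutoff for d in citing_dates)
```

Whether the window boundary is inclusive is a choice made here for the sketch; the text only specifies a 5-year horizon.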



Results 

Three characteristics are computed for each institution i (Fig. 2): the institution size (A_i), representing the total
number of distinct authors that published at least one paper at institution i; the number of papers (P_i) published
under affiliation i; and the cumulative number of citations (C_i) collected by all papers P_i. We find that P(A) follows a
fat-tailed distribution, indicating significant population heterogeneity among different institutions (Fig. 2a).

Figure 1 | Illustrative example of career trajectory reconstruction for hypothetical authors. Given papers N° 1 and N° 2, we know that the author John
J. Smith was affiliated with Northeastern University in 1963 and Harvard University in 1988. Extracting information from all his other publications allows us
to reconstruct his career trajectory and discover that he was affiliated with Northeastern University for 8 years, where he published 5 papers, and then moved
to Harvard University for 23 years, where he published 16 papers. The cumulative number of citations of a paper obtained within 5 years after the
publication is also known.



While most institutions are small, a few have a large number of 
scientists, often corresponding to large institutes or universities. 
We observe similar disparity in P(C) (Fig. 2b): few institutions 
acquire a large number of citations, while most research labs or 
universities receive few citations. 

Figures 2c-d show the correlation between the institution size A
and both the average publication impact C/P and the average
productivity P/A of institutions. The average productivity and impact of
an institution are different but complementary measures of scientific
performance. We find the institution size has little influence on
productivity (R² = 0.43) (Fig. 2d), yet it positively correlates with
the impact of publications (R² = 0.85), indicating that large
institutions offer a more innovative/higher-impact environment than
smaller ones, as captured by citations per paper (Fig. 2c). Also, as larger
institutions have more internal collaborations, the number of
co-authors on publications from large institutions might be larger and,
as a consequence, attract more citations.

Many institutions are small with few citations, hence they account
for a very small portion of the data. For the rest of the paper, we
focus on the thousand most cited institutions, accounting for more
than 99% of papers. They correspond to institutions with at least 698
citations within the APS data over the 120-year period (shaded area
in Fig. 2).

Mobility is often important in furthering a professional career. In
science, the best lab for the type of research you are doing is usually
not where you are. Nowadays changing countries is a rite of
passage for many young researchers who follow the resources and
facilities. As the patterns and characteristics of these migrations
are blurry, we need to systematically study the mobility of scientists.
Thanks to the large disambiguated dataset spanning the last 120 years
that we have compiled, a systematic study of scientific mobility is
now possible.

The strong correlations between the three quantities (A, P, C)
indicate that any of the three could characterise an institution, serving
as a proxy of its ranking against others. Here, we choose C (the total
number of citations) as our parameter to approximate the ranking by
reputation. Other parameters such as the h-index of an institution or
the number of papers P could also be used. But the results should
be insensitive to this choice owing to the good correlations between these
quantities (R² = 0.96 and R² = 0.92, respectively). The top-ranked
institutions all correspond to well-known universities or research
labs with a long tradition of excellence in physics (Fig. 3),
corroborating our hypothesis that C is a reasonable proxy for ranking. We can
also observe the similarity and stability of the ranking when
comparing with other metrics.

We focus on authors with similar career longevity, restricting our
corpus to those who began their career between 1950 and 1980 and
published for at least 20 years without any interruption exceeding 5
years. Following these criteria, we arrived at a subset of 2,725
scientists to study the mobility patterns and their impact on their careers.
A total of 5,915 career movements are recorded for this corpus.
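
The selection criteria above can be expressed as a simple filter over an author's publication years. The sketch below is illustrative: the function name and its parameters are hypothetical, and it assumes that "publishing for at least 20 years" means the span between the first and last publication year, which the text does not state explicitly.

```python
def has_long_steady_career(pub_years, start_range=(1950, 1980), min_span=20, max_gap=5):
    """Return True if an author's publication years satisfy the selection criteria.

    Assumptions of this sketch: the career starts at the first publication year,
    career length is last year minus first year, and an interruption is the
    difference between two consecutive publication years.
    """
    years = sorted(set(pub_years))
    if not years:
        return False
    first, last = years[0], years[-1]
    # Career must begin inside the chosen start window.
    if not (start_range[0] <= first <= start_range[1]):
        return False
    # Career must span at least `min_span` years.
    if last - first < min_span:
        return False
    # No interruption between consecutive publications may exceed `max_gap` years.
    return all(b - a <= max_gap for a, b in zip(years, years[1:]))
```

For example, an author publishing in 1960, 1963, 1968, 1972, 1977 and 1981 passes the filter, whereas one with a 7-year silence between papers does not.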

In Figure 4a we select three individuals as exemplary career
histories. Each line represents one individual, with circles denoting
his/her publications, allowing us to observe his/her location. The size of
the circle is proportional to the citations the paper acquires in five years,
approximating the impact of the work. By studying the whole corpus,
we compute P(m), the probability for a scientist to have visited m
different institutions along his/her career (Fig. 4c), finding that career
movements are common but infrequent: only 14% of scientists never
moved at all (m = 1). Those who move mostly do so once
or twice, with P(m) decaying quickly as m increases. We also compute
P(t), the probability to observe a movement at time t, where t = 0
corresponds to the date of the scientist's first publication. We find
that most movements occurred in the early stage of the career
(Fig. 4b), supporting the hypothesis that changing affiliations is a
rite of passage for young researchers. This likely corresponds to the
postdoc period where graduates broaden their horizons through
mobility. This may also reflect the increasing cost of relocation and
family constraints as families develop. A third characteristic is the
geographical distance of movements, Δd. Existing literature hints at
somewhat competing hypotheses on the role geography plays in
career movements. Indeed, research on human mobility suggests that

Figure 2 | Basic features of research institutions. (a) The probability density function of institution size, A, follows a fat-tailed distribution, indicating a
significant heterogeneity. While most institutions are small, a few have a large population, often representing large institutes or universities with a
long history. (b) The probability density function of citations of institutions, C, is also very heterogeneous. Few institutions acquired a large number of
citations, while most research labs or universities received few citations. Only the first thousand locations are taken into account in further analyses
(shaded area). (c) The correlation between institution size and average publication impact is reported. Institution size positively correlates with the
impact of publications (R² = 0.9), indicating that large institutions offer a more innovative/higher-impact environment than smaller ones, as captured by
citations per paper. The dashed line indicates a power-law behaviour with exponent α = 0.204 ± 0.006. (d) The correlation between institution size and
institution average productivity is also reported, indicating institution size has little influence on productivity (R² = 0.43). The dashed line indicates a
power-law behaviour with exponent α = 0.037 ± 0.003.



regular human movements mostly cover short distances with
occasional longer trips, characterized by a power-law distance
distribution; in contrast, country-level surveys find increasing
cross-country movements mostly due to cultural exposure and life
quality concerns, indicating a potential dominance of long-distance
moves in career choices compared with typical human
travel (refs. 1-3, 5, 29-31). We measure the distance distribution over all moves
observed in our dataset, finding that our result is supported by a
combination of both hypotheses. We find the probability to move
to further locations decays as a power law, whereas the null model
predicts this probability to be flat (Fig. 4d). This observation is
consistent with studies on human mobility, in that short-distance moves
dominate career choices. Yet, when comparing the power-law
exponents, we find the exponent characterizing career moves (γ = 0.65 ±
0.053) is much smaller than those observed in human travel (γ ~ 2),
corresponding to a higher likelihood of observing long-range
movements. This observation might be explained by the influence that
scientific collaborations can have on career movements, as similarly
low exponents are observed for collaboration networks between
cities.

Taken together, the preceding results indicate that career moves
mostly happen during the early stage of a career and are more likely
to cover short distances. The observed localization in both time and
space raises the question of how individuals move as a function of
institutional rankings. To this end, denoting by T_{i→j} the number of
transitions from the institution of rank i to the one of rank j, we
measure P(i, j), the probability to have a transition from rank i to
rank j, as

$$P(i,j) = \frac{T_{i \to j}}{\sum_{k,l} T_{k \to l}} \qquad (1)$$

Figure 3 | Ten most cited institutions in physics. Comparison between
different rankings. The h-index is closely related to the number of citations,
as we can observe. Top-ranked institutions all correspond to well-known
universities or research labs with a long tradition of excellence in physics,
corroborating our hypothesis that C is a reasonable proxy for ranking.

Figure 4 | Basic features of scientists' careers. (a) Illustration of three scientific trajectories based on publications, where each line corresponds to one
scientist and each publication is represented by a circle whose size is proportional to its number of citations cumulated within 5 years after its publication.
The institutions are ranked according to the total number of citations they obtained (see Methods), 1 being the most cited institution. (b) The probability
density function of movement according to time, P(t), shows that most movements occurred in the early stage of the career. This likely corresponds to the
postdoc period where graduates broaden their horizons through mobility. (c) The probability density function of the number of visited institutions
for a scientist along his/her career, P(m), indicates that career movements are common but infrequent. Scientists mostly move once or twice, P(m) decaying
quickly as m increases. (d) The probability density function of the distance of movements, P(Δd), has a fat tail that can be fitted by a power law with an
exponent γ = 0.65 ± 0.053, whereas the null model predicts this probability to be roughly flat.



Interestingly, we find that most movements involve elite institutions 
(rank is small), and transitions between bottom institutions are rare 
(Fig. 5a). This is due to the fact that elite institutions are characterised 
by larger populations, hence translating into more events. 

To account for the population-based heterogeneity, we compare
the observed P(i, j) with the probability P^rand(i, j) expected in a
random model where we randomly shuffle the transitions from
institution i to j while preserving the total number of transitions from and
to each institution. Formally, in this null model, we have

$$P^{\mathrm{rand}}(i,j) = \sum_{k} P(i,k) \cdot \sum_{l} P(l,j), \qquad (2)$$

and we compare P(i, j) with the null model by computing the matrix

$$M(i,j) = \frac{P(i,j)}{P^{\mathrm{rand}}(i,j)} = \frac{P(i,j)}{\sum_{k} P(i,k) \cdot \sum_{l} P(l,j)}. \qquad (3)$$



M(i, j) is the ratio between the probability P(i, j) to have a transition
from rank i to rank j and the probability P^rand(i, j) when the
movements are shuffled, measuring the likelihood for a move to take place
while accounting for the size of the institutions. Hence, M(i, j) = 1
indicates the amount of observed movements is about what one
would expect if movements were random. Similarly, M(i, j) > 1
indicates that we observe more transitions from i to j than
expected, whereas M(i, j) < 1 corresponds to transitions that are
underrepresented. We find that career moves are characterized by
a high degree of stratification in institutional rankings (Fig. 5b).
Indeed, we observe two distinct clubs (red spots in Fig. 5b), indicating
that the overrepresented movements are the ones within elite
institutions (lower-left corner) or within lower-rank institutions
(upper-right corner), and scientists belonging to one of the two groups tend
to move to institutions within the same group. On the other hand,
both upper-left and lower-right corners are colored blue, indicating that
cross-group movements (transitions from elite to lower-rank
institutions and vice versa) are significantly underrepresented. Also,
scientists from medium-ranked institutions move to the next
institution with a probability that is indistinguishable from the random
case. In other words, their movements indicate no bias towards
middle, elite or lower-ranked institutions.
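
As an illustration of the comparison between observed transitions and the size-preserving null model, the following minimal sketch computes P and M from a matrix of transition counts between rank groups. The function name and the nested-list representation of the counts are assumptions made for this example; it is not the authors' code.

```python
def stratification_matrix(T):
    """Compute P(i, j) and M(i, j) from a square matrix of transition counts.

    T[i][j] is the number of observed moves from rank group i to rank group j
    (hypothetical input layout). The null expectation is the product of the
    outgoing share of i and the incoming share of j.
    """
    n = len(T)
    total = sum(sum(row) for row in T)
    # Observed transition probabilities.
    P = [[T[i][j] / total for j in range(n)] for i in range(n)]
    # Marginals: share of moves leaving i, and share of moves arriving at j.
    out_share = [sum(P[i]) for i in range(n)]
    in_share = [sum(P[l][j] for l in range(n)) for j in range(n)]
    M = []
    for i in range(n):
        row = []
        for j in range(n):
            expected = out_share[i] * in_share[j]
            # Undefined when a group has no outgoing or no incoming moves.
            row.append(P[i][j] / expected if expected > 0 else float("nan"))
        M.append(row)
    return P, M
```

With two groups and counts [[8, 2], [2, 8]], within-group moves give M = 1.6 and cross-group moves give M = 0.4, the pattern that would be read as stratification.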

The high intensity of stratification in career movements raises an 
interesting question: how does individual performance in science 
relate to their moves across different institutional rankings? 

To answer this question, we need to quantify the performance
change for each individual before and after the move. Imagine that
a scientist moves from i to j, and published n papers at location i and
m papers at j. The impact of a paper k can be approximated by c_k, the
number of citations cumulated within 5 years after its publication.
Let $c^{-} = \{c^{-}_{1}, \dots, c^{-}_{n}\}$ and $c^{+} = \{c^{+}_{1}, \dots, c^{+}_{m}\}$ be
the lists of number of citations for papers published before (c^-)
and after (c^+) the transition from i to j (T_{i→j}). To quantify the change
in performance, we introduce

$$\Delta c^{*} = \frac{\bar{c}^{+} - \bar{c}^{-}}{\sigma_c}, \qquad (4)$$

where $\bar{c}^{+}$ and $\bar{c}^{-}$ are the averages of c^+ and c^-, respectively, and σ_c
corresponds to the standard deviation of the concatenation of both
c^+ and c^- while preserving the moment when the movement took
place (see SM for more information about σ_c). Therefore, Δc* captures
the statistical difference in the average citations between papers
published before and after the movement, normalized by the random
expectation when the same author's publications were shuffled. A
positive Δc* indicates papers following the move on average result in
higher citation impact, hence representing an improvement in
scientific performance. A negative value corresponds to a decline in
performance.
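
A simplified version of this performance measure can be sketched as below. It is an approximation under a stated assumption: the normalization is taken to be the plain population standard deviation of the pooled citation list, whereas the exact definition, based on shuffling the author's publications, is given in the SM and may differ. The function name is hypothetical.

```python
from statistics import mean, pstdev


def delta_c_star(before, after):
    """Normalized change in mean 5-year citations across a move.

    `before` and `after` are non-empty lists of citation counts for papers
    published before and after the transition. The denominator is the
    population standard deviation of the pooled list (an assumption of this
    sketch, standing in for the normalization detailed in the SM).
    """
    pooled = list(before) + list(after)
    sigma = pstdev(pooled)
    if sigma == 0:
        # All papers have identical impact, so no change is measurable.
        return 0.0
    return (mean(after) - mean(before)) / sigma
```

A positive return value corresponds to higher average impact after the move, a negative one to a decline.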

To quantify the influence of movements on individual
performance, we divide all movements into two categories based on the
performance change: movements associated with positive and
negative Δc*, and measure M(i, j | Δc* > 0) and M(i, j | Δc* < 0). We find
the observed stratification in career moves is robust against
individual performance (Fig. 5c-d). That is, the two clubs emerge for
both categories in a similar fashion as in Figure 5b, indicating the
pattern of moving within elite or lower-rank institutions is nearly
universal for people whose performance improved or decreased
following the move. Comparing Figure 5c and Figure 5d, we find the
red spot in the lower-left corner is more concentrated in Figure 5d than
in Figure 5c, hinting that being more mobile in the space of rankings
may lead to variable performance. To test this hypothesis, for each
transition T_{i→j} we calculate the rank difference between the origin and
destination (Δr = i − j).

A positive value of Δr indicates i > j, hence a movement to a
lower-rank institution, whereas Δr < 0 corresponds to transitions
into institutions with a higher rank. In Figure 6 we measure the
relation between Δc* and Δr. When scientists move to institutions
with a lower rank (Δr > 0), we find that their average change in
performance is negative, corresponding to a decline in the impact of
their work. Yet, what is particularly interesting lies in the Δr < 0
regime. Indeed, when people move from lower-rank locations to elite
institutions, we observe no performance change on average. This is
rather unexpected, as transitioning from lower-rank institutions to
elite institutions is thought to provide better access to ideas and lab
resources, which in turn should fuel scientific productivity. A
possible explanation may be that scientists who have the opportunity to
make big jumps in the ranking space may have already had an
excellent performance in their previous institutions. A move therefore will
not affect their impact.

Discussion 

In summary, we extracted affiliation information from the
publications of each scientist, allowing us to reconstruct their career moves
between different institutions as well as the body of work published at
each location. We find career movements are common yet
infrequent. Most people move only once or twice, and usually in the early
stage of their career. Career movements are affected by geography.
The distance covered by the move can be approximated with a power-law
distribution, indicating that most movements are local and
moving to faraway locations is less probable. We also observe a high
degree of stratification in career movements. People from elite
institutions are more likely to move to other elite institutions, whereas
people from lower-rank institutions are more likely to move to places
with similar ranks. We further confirm that the observed
stratification is robust against the change in individual performance before
and after the move. When cross-group movement occurs, we find
that while going from elite to lower-rank institutions on average
results in a modest decrease in scientific impact, transitioning into
elite institutions does not result in a gain in impact.

Figure 6 | Impact of movements on career performance. The relation
between the statistical difference of citations (Δc*) and the ranking
difference (Δr) associated with a transition shows that, when people move to
institutions with a lower rank (Δr > 0), their average change in
performance is negative, corresponding to a decline in the impact of their
work. Yet, what is particularly interesting lies in the Δr < 0 regime. Indeed,
when people move from lower-rank locations to elite institutions, we
observe no performance change on average.

The nature of our dataset restricted our study to a sample of
scientists. As a result of this selection process, our results are biased
towards physicists from the 1960s to the 1980s with high career longevity.
Yet, these limitations also suggest new avenues for further
investigations. Indeed, as datasets become more comprehensive and of
higher resolution, newly available data sources like Web of Science
or Google Scholar can provide new and deeper insights towards
generalization of the results across different disciplines, temporal
trends, and more. Further investigations regarding the influence of
career longevity on scientific mobility should also be considered, as they
could reveal important results. Taken together, our results
offer the first systematic empirical evidence on how career moves
affect scientific performance and impact.

Method 

Dataset. The data provided by the American Physical Society (APS) contain over
450,000 publications, each identified with a unique number, corresponding to all
papers published in 9 different journals, namely Physical Review A, B, C, D, E, I, L, ST
and Reviews of Modern Physics, spanning a period of 117 years from 1893 to 2010. For
each paper the dataset includes title, date of publication (day, month, year), author
names and affiliations of each of the authors. A separate dataset also provides a list of
citations within the APS data only, using unique paper identifiers. About 5% of
publications, those with ambiguous author-affiliation links or massive authorship, were
removed from this dataset (see SM for more details).

Author Name Disambiguation. To derive individual information, one has to
reconnect papers belonging to a single scientist. Since no unique author identifier is
present in the data, author names must be disambiguated. The dataset contains about
1.2 million author-paper pairs. To overcome the ambiguities present in the data,
we design a procedure that uses information about the author as well as metadata about
the paper, such as coauthors and citations. By computing similarities between authors,
our procedure can successfully detect single authors as well as homonyms (see SM
for more details about the disambiguation method). A total of 237,038 distinct
scientists are detected by our method.

Affiliation Disambiguation. A major difficulty when dealing with publication
data is the inconsistencies and errors associated with affiliation names on papers. A
total of 319,829 different affiliation names are identified in the dataset. The
disambiguation procedure for affiliations uses geocoded information as well as a
similarity measure between affiliation names in order to disambiguate institutions.
The disambiguated set of authors also plays a crucial role in the procedure (see SM for
more details about the disambiguation method). A total of 4,052 distinct institutions
are identified by our algorithm.

Resolving individual career trajectory. Based on the information present in the
publications of a scientist, we can reconstruct his/her career trajectory. In order to
detect career movements, i.e. changes in a scientist's institution, one has to remove
artificial movements induced by short-term stays and by errors and typos in the
affiliation names on the papers. To do so, only institutions reported in at least two
consecutive papers are considered in a career trajectory.
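
The filtering rule above can be sketched as follows. This is an illustrative reading of the rule, not the authors' implementation: the function name, the (year, institution) input format, and the choice to merge two stays at the same institution once a spurious one-paper stay between them is dropped are all assumptions of the example.

```python
from itertools import groupby


def career_trajectory(papers, min_run=2):
    """Reconstruct a list of stays from an author's publication record.

    `papers` is an iterable of (year, institution) pairs (hypothetical format).
    Returns a list of (institution, first_year, last_year, n_papers) tuples,
    keeping only institutions that appear on at least `min_run` consecutive papers.
    """
    ordered = sorted(papers, key=lambda p: p[0])  # chronological order
    stays = []
    for inst, run in groupby(ordered, key=lambda p: p[1]):
        run = list(run)
        if len(run) < min_run:
            continue  # treat as a short-term stay or an affiliation typo
        if stays and stays[-1][0] == inst:
            # Same institution as the previous stay: extend it instead of
            # recording an artificial move.
            _, first, _, count = stays[-1]
            stays[-1] = (inst, first, run[-1][0], count + len(run))
        else:
            stays.append((inst, run[0][0], run[-1][0], len(run)))
    return stays
```

The number of career movements for an author is then the number of stays minus one.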

Ranking the institutions. Three variables are considered to rank an institution: (i)
the total number of papers, P_i, published with institution i, (ii) the cumulated number
of citations, C_i, corresponding to institution i, (iii) the h-index, H_i, of institution i. The
variable C_i is defined as

$$C_i = \sum_{k=1}^{P_i} c_k,$$

where c_k is the number of citations within the
APS data of paper k cumulated within 5 years after its publication. An institution has
an h-index H if H of its P papers have at least H citations each, and the other (P − H)
papers have no more than H citations each. Here, the citations of a paper are the cumulative
number of citations obtained within 5 years after its publication.
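
For concreteness, the h-index definition above can be computed with a few lines of code. The function name and the list-of-counts input are assumptions of this sketch; the citation counts passed in would be the 5-year counts c_k of an institution's papers.

```python
def h_index(citations):
    """Return the largest H such that H papers have at least H citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position  # the top `position` papers all have >= position citations
        else:
            break
    return h
```

The cumulated citation count C_i of the same institution is simply `sum(citations)` over the same list.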

Binning the institutions. About 6,000 transitions between 1,000 institutions are
detected for our subset of scientists. In order to have a statistically significant number
of transitions to derive the values of P(i, j) and M(i, j) (Fig. 5), institutions are binned
logarithmically according to their rank (r) into five groups.
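
One possible way to realise such a logarithmic binning is sketched below. The exact bin edges used in the analysis are not given in the text, so the equal-width bins in log(rank) between rank 1 and rank 1,000 are an assumption of this example, as is the function name.

```python
import math


def log_rank_bin(rank, max_rank=1000, n_bins=5):
    """Map an institution rank (1 = most cited) to one of `n_bins` log-spaced groups.

    Bins have equal width in log(rank); bin 0 holds the top-ranked institutions.
    """
    if not 1 <= rank <= max_rank:
        raise ValueError("rank must lie between 1 and max_rank")
    # Position of the rank on a logarithmic scale, between 0 and 1.
    frac = math.log(rank) / math.log(max_rank)
    # The last rank falls exactly on the upper edge, so clamp it into the last bin.
    return min(int(frac * n_bins), n_bins - 1)
```

Under this scheme the five groups cover ranks 1-3, 4-15, 16-63, 64-251 and 252-1,000, so the elite groups contain few institutions while the lowest group contains most of them.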